Support auto-allocate VLAN for SubnetConnectionBindingMap by wenqiq · Pull Request #1445 · vmware-tanzu/nsx-operator

wenqiq · 2026-06-02T08:55:24Z

When spec.vlanTrafficTag is not set, the subnetbinding controller allocates a VLAN from [1, 4094], writes it to spec, and realizes the NSX binding map. Used VLANs are discovered via NSX search on parent Subnet paths so bindings created outside the Supervisor are included.

TestDone:

Ensure the Testbed has the latest nsx-operator deployed with the VLAN auto-allocation feature.
before deploy the new manager:

Create a YAML file test-vlan-auto.yaml containing a parent subnet, two child subnets, and a binding map:

apiVersion: crd.nsx.vmware.com/v1alpha1
kind: Subnet
metadata:
  name: parent-subnet
  namespace: default
spec:
  ipv4SubnetSize: 16
---
apiVersion: crd.nsx.vmware.com/v1alpha1
kind: Subnet
metadata:
  name: child-subnet-1
  namespace: default
spec:
  ipv4SubnetSize: 16
---
apiVersion: crd.nsx.vmware.com/v1alpha1
kind: Subnet
metadata:
  name: child-subnet-2
  namespace: default
spec:
  ipv4SubnetSize: 16
---
apiVersion: crd.nsx.vmware.com/v1alpha1
kind: SubnetConnectionBindingMap
metadata:
  name: binding-auto-1
  namespace: default
spec:
  subnetName: child-subnet-1
  targetSubnetName: parent-subnet
  # Intentionally omit vlanTrafficTag to trigger auto-allocation

Verify Auto-allocation

Run kubectl apply -f test-vlan-auto.yaml.
Wait for the resources to be ready: kubectl get subnetconnectionbindingmap binding-auto-1.
Check the allocation result: kubectl get subnetconnectionbindingmap binding-auto-1 -o yaml.
Expected Result: spec.vlanTrafficTag is automatically injected (e.g., 0, 100, or another value within the [0, 4094] range).

codecov-commenter · 2026-06-02T11:24:42Z

Codecov Report

❌ Patch coverage is 66.21005% with 74 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.93%. Comparing base (c44e899) to head (929221e).

Files with missing lines	Patch %	Lines
pkg/nsx/services/subnetbinding/vlan_allocation.go	63.15%	23 Missing and 5 partials ⚠️
...trollers/subnetbinding/subnetbinding_controller.go	61.42%	23 Missing and 4 partials ⚠️
pkg/nsx/util/utils.go	0.00%	10 Missing ⚠️
pkg/nsx/services/vlanpool/vlanpool.go	85.00%	5 Missing and 4 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1445      +/-   ##
==========================================
- Coverage   78.05%   77.93%   -0.13%     
==========================================
  Files         186      188       +2     
  Lines       25171    25382     +211     
==========================================
+ Hits        19648    19782     +134     
- Misses       4204     4267      +63     
- Partials     1319     1333      +14

Flag	Coverage Δ
unit-tests	`77.93% <66.21%> (-0.13%)`	⬇️

Files with missing lines	Coverage Δ
pkg/nsx/services/subnetbinding/builder.go	`93.18% <100.00%> (ø)`
pkg/nsx/services/subnetbinding/subnetbinding.go	`92.07% <100.00%> (ø)`
pkg/nsx/services/vlanpool/vlanpool.go	`85.00% <85.00%> (ø)`
pkg/nsx/util/utils.go	`86.31% <0.00%> (-1.86%)`	⬇️
...trollers/subnetbinding/subnetbinding_controller.go	`73.69% <61.42%> (-3.42%)`	⬇️
pkg/nsx/services/subnetbinding/vlan_allocation.go	`63.15% <63.15%> (ø)`

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

dantingl

Could you add more local test cases that some existing NSX BindingMap already allocate some VLAN ID

wenyingd · 2026-06-04T10:17:38Z

+		}
+	}
+
+	if err := r.patchSpecVlanTrafficTag(ctx, bindingMap, vlan); err != nil {


Firstly, it is not suggested to let the reconciler update SubnetConnectionBindingMap spec, which is possible to break user's intent, e.g., we may not differentiate if this VLAN is set by user or by nsx-operator. Updating spec is different from updating status. So I would prefer to introduce a field in SubnetConnectionBindingMap.status to record the final VLAN, no matter it is from user's intent (spec) or from the auto-calculation of nsx-operator.

Secondly, the current implementation is 1) patch the CR first, and then 2) continue the reconciliation (sync to nsx subnet-connection-binding-map) without a re-queue. It makes the reconciler usage mixed. As we have patched the CR, another update event will be triggered, in which the CR could have all required configurations. So I would prefer to split the "VLAN allocation" and "nsx resource create/update" into two independenty reconciliations, e.g., 1) if VLAN is not specified in one event handling cycle, the reconciler only allocates VLAN, 2) if everything is ready, do the reconciliation to sync to NSX.

Even more, if we must go with the route to patch the CR spec rather than introducing a field in the status, what do you think to use webhook for allocating VLAN?

Personally, I prefer to use the status to show the auto-allcated VLAN rather than patching spec..

Thanks for reviewing. I've updated the implementation accordingly.

On splitting reconciliation: Agreed. VLAN allocation and NSX sync are now handled in separate reconcile cycles. When vlanTrafficTag is omitted, the reconciler only allocates a VLAN, persists it to the CR spec, and returns. A subsequent reconcile (triggered by the spec update) validates the VLAN and syncs to NSX.

On webhook-based allocation: I don't think a mutating webhook is a good fit here. VLAN allocation depends on dependent Subnets being ready and NSX state (cache/NSX query, pending pool, overlap retry), which fits the controller lifecycle better than admission.

On persisting to spec vs. status: We still persist the auto-allocated VLAN in spec.vlanTrafficTag (The API definition still needs to be confirmed by @jianjuns). To distinguish operator-allocated vs. user-provided VLANs after persistence, we set annotation nsx.vmware.com/auto_allocated_vlan_traffic_tag when writing the spec. This keeps pending VLAN pool commit/release correct in the second reconcile cycle.

wenyingd · 2026-06-04T10:44:21Z

+func (r *Reconciler) reconcileVlanTrafficTag(ctx context.Context, bindingMap *v1alpha1.SubnetConnectionBindingMap, parentSubnetPaths []string) *errorWithRetry {
+	if bindingMap.Spec.HasVlanTrafficTag() {
+		vlan := *bindingMap.Spec.VLANTrafficTag
+		if err := r.VlanPoolService.ValidateManualVlan(parentSubnetPaths, vlan, string(bindingMap.UID)); err != nil {


Why we need to verify the user input vlan tag?

What if this vlan tag is allocated by the controller? e.g., in round 1, controller successfully reconciles the CR, then user has made update, the resource enqueue again, then this function may break the controller to do the expected reconciliation.

wenyingd · 2026-06-04T10:54:16Z

+
+// Allocate picks an available VLAN. When preferred is valid and unused on the target, it is returned;
+// otherwise the smallest free ID from [0, 4094] is used.
+func (s *Service) Allocate(parentSubnetPaths []string, excludeCRUID string, preferred int64) (int64, error) {


Can we try a valid tag (not used) from the cache first, then query from NSX only when NSX returns error that the value is already used?

wenqiq · 2026-06-09T09:49:49Z

Could you add more local test cases that some existing NSX BindingMap already allocate some VLAN ID

Updated the E2E test cases to include this.

jianjuns · 2026-06-10T19:29:06Z

+                  unique on the target Subnet or SubnetSet. When omitted, the operator auto-allocates a VLAN ID.
                format: int64
                maximum: 4094
-                minimum: 0


Please update the commit description (it now says [0-4094]) too.

@jianjuns NSX API supports VLAN [0-4094], so do you think we should keep the same range in CRD definition including VLAN 0, but do auto-allocation from VLAN 1?

VLAN 0 maps to the PARENT (VNIC) port. How can it be used for CHILD ports?

jianjuns · 2026-06-10T19:33:35Z

+	return nil
+}
+
+func (r *Reconciler) reconcileVlanTrafficTag(ctx context.Context, bindingMap *v1alpha1.SubnetConnectionBindingMap, parentSubnetPaths []string, fromNSX bool) *errorWithRetry {


For a VLAN ext Subnet, could we first try using the Subnet's VLAN ID, when it is not used yet?

It seems there is no architectural relationship between the VLAN extension Subnet's VLAN ID and the vlanTrafficTag, correct? This aligns with previous comments that mentioned: #1445 (comment) #1445 (comment)

cc @dantingl @wenyingd

While what you said is true, from VLAN network consumer perspective - esp. for those who used to consume VLAN DVPG or SV VDS network directly, but migrate to VPC and VLAN extension - it would be much easier to keep using the same VLAN IDs in the guest.
Think about the major use case of Telco.

Good point. Updated.

wenyingd · 2026-06-11T06:22:58Z

@@ -79,14 +79,13 @@ spec:
              vlanTrafficTag:


@jianjuns could you help share your thoughts regarding how to show the "auto-allocated" vlan in the CR?

We have two options,
option 1: Re-use SubnetConnectionBindingMap CR spec.vlanTrafficTag for the allocated VLAN,
option 2: keep spec.vlanTrafficTag as empty (the same as user input), and introduce a field in status to show the final VLAN configured in NSX.

The current implementation is option1, the advantage is we don't need to update the APIs. In the meanwhile, it is impossible to know whether the VLAN is allocated by system or provided by the user (user input spec is modified). For nsx-operator logic, it may have two request to kube-apiserver : 1) update the CR first and 2) update status with the conditions.

For option 2, the advantage is the original user intent is kept, and nsx-operator only send request to apiserver to "update status". In the meanwhile, we may need to introduce a new field in status to record the final VLAN used in NSX.

Thanks @wenyingd, I`d like to add a point is that, during the reconcile process we set annotation nsx.vmware.com/auto_allocated_vlan_traffic_tag when writing the spec, this helps distinguish operator-allocated vs. user-provided VLANs. see #1445 (comment)

In my understanding, K8s API design does not reuse the Spec field to populate system generated values, while NSX Policy API has such cases. Could you also check with @tnqn on this one?

I checked, and it appears these two examples are similar to our situation, right? K8s dynamically allocates values for omitted spec fields (e.g., ClusterIP, NodePort) and writes them back to the spec.

Sure, but there can also be an argument that Service v1 API is a legacy design pattern. Could we understand the NSX Policy API pattern for such cases to decide?

But that can anyway happen since we will continue to support user specified VLAN ID in the Spec. The discussion here is just that when user does not specify a VLAN ID, should system auto-allocated VLAN ID be set in the same Spec field.

If status.vlanID is introduced, the consumer should use status.vlanID only, which resolves the above problem.

When user specify spec.vlanID, if it cannot be realized in NSX, status.vlanID will not be set.

When user specify spec.vlanID, if it can be realized in NSX, status.vlanID will be set to spec.vlanID.

When user omit spec.vlanID, status.vlanID will be set to the allocated VLAN ID.

I see what you mean now. However, even without VLAN ID in status, the consumer should check whether the resource is realized from the status before using it?

It could but it's more complex, especially if the vlan ID is mutable: consumer needs to know the realized status is for the previous ID or the current ID. There could be solutions anyway even if reusing the field in spec. It's just a recommendation to separate the user input and the actual status to make it simpler.

Make sense. When NSX supports VLAN ID auto-allocation and we also expose SubnetConnectionBindingMap in CCI, in your mind how should we define the CCI API - auto-allocated VLAN ID in Status? @tnqn

When spec.vlanTrafficTag is not set, the subnetbinding controller allocates a VLAN from [0, 4094], writes it to spec, and realizes the NSX binding map. Used VLANs are discovered via NSX search on parent Subnet paths so bindings created outside the Supervisor are included. Signed-off-by: Wenqi Qiu <wenqi.qiu@broadcom.com>

Signed-off-by: Wenqi Qiu <wenqi.qiu@broadcom.com>

wenqiq force-pushed the topic/wenqi/VLAN-auto-allocation branch from 018d05f to 3443291 Compare June 2, 2026 11:09

wenqiq force-pushed the topic/wenqi/VLAN-auto-allocation branch from 3443291 to 6ffa2b2 Compare June 2, 2026 13:12

wenqiq marked this pull request as ready for review June 2, 2026 17:48

dantingl requested review from wenyingd and yanjunz97 June 4, 2026 06:27

dantingl reviewed Jun 4, 2026

View reviewed changes

Comment thread pkg/controllers/subnetbinding/subnetbinding_controller.go Outdated

wenyingd reviewed Jun 4, 2026

View reviewed changes

wenqiq force-pushed the topic/wenqi/VLAN-auto-allocation branch 3 times, most recently from 797b822 to e59068d Compare June 9, 2026 07:02

wenqiq force-pushed the topic/wenqi/VLAN-auto-allocation branch from 6cf497f to 7f070ee Compare June 9, 2026 11:13

dantingl requested a review from jianjuns June 10, 2026 02:49

jianjuns reviewed Jun 10, 2026

View reviewed changes

wenqiq force-pushed the topic/wenqi/VLAN-auto-allocation branch from 7f070ee to a2adbc0 Compare June 11, 2026 04:16

wenyingd reviewed Jun 11, 2026

View reviewed changes

wenqiq added 3 commits June 26, 2026 09:06

update

3f8f4ac

Signed-off-by: Wenqi Qiu <wenqi.qiu@broadcom.com>

Move auto-allocated VLAN ID to Status in SubnetConnectionBindingMap

5751845

Signed-off-by: Wenqi Qiu <wenqi.qiu@broadcom.com>

wenqiq force-pushed the topic/wenqi/VLAN-auto-allocation branch from 77f8dcf to 5751845 Compare June 26, 2026 01:52

fix ut

929221e

Signed-off-by: Wenqi Qiu <wenqi.qiu@broadcom.com>

wenqiq force-pushed the topic/wenqi/VLAN-auto-allocation branch from d9faf0f to 929221e Compare June 26, 2026 06:47

Uh oh!

Conversation

wenqiq commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov-commenter commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

dantingl left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

wenqiq commented Jun 9, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tnqn Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

wenqiq commented Jun 2, 2026 •

edited

Loading

codecov-commenter commented Jun 2, 2026 •

edited

Loading

tnqn Jun 18, 2026 •

edited

Loading